Spectral Clustering for Microsoft Netscan Data
Abstract
We present the results of exploratory data analysis for a data set that consists of cross-posting information for 89,687 newsgroups over a period of 3.4 years. The data set is part of the Microsoft Netscan data. Our goal is to investigate the community structure of the newsgroup data set, with a specific focus on spectral hierarchical clustering. We present a spectral hierarchical clustering algorithm and discuss existing and novel ways to measure the quality of a hierarchical clustering. We construct spectral hierarchical clusterings for ten subsets of the data set and evaluate the stability of the results.
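The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to build a spectral hierarchy: recursive bisection by the sign of the Fiedler vector of the normalized graph Laplacian. The function names (`fiedler_split`, `spectral_hierarchy`), the `min_size` stopping rule, and the toy cross-posting matrix `W` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def fiedler_split(W):
    """Return a boolean mask that bisects a weighted undirected graph.

    W is a symmetric matrix of non-negative edge weights (here imagined as
    cross-posting counts between newsgroups; that reading is an assumption).
    """
    d = W.sum(axis=1)
    # Guard against isolated nodes with zero degree.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    # Eigenvector of the second-smallest eigenvalue, rescaled by D^{-1/2}.
    fiedler = vecs[:, 1] * d_inv_sqrt
    return fiedler >= 0

def spectral_hierarchy(W, nodes=None, min_size=2):
    """Recursively bisect the graph; returns a nested list of node indices."""
    if nodes is None:
        nodes = np.arange(len(W))
    if len(nodes) <= min_size:
        return nodes.tolist()
    mask = fiedler_split(W[np.ix_(nodes, nodes)])
    if mask.all() or not mask.any():
        # The split put everything on one side; stop here.
        return nodes.tolist()
    return [spectral_hierarchy(W, nodes[mask], min_size),
            spectral_hierarchy(W, nodes[~mask], min_size)]

# Toy example: two tightly linked triangles joined by one weak edge.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

tree = spectral_hierarchy(W, min_size=3)
print(tree)
```

On the toy graph the top-level split separates the two triangles; the order of the two branches depends on the arbitrary sign of the eigenvector. For a data set of the size described in the abstract, a sparse eigensolver would be needed in place of the dense `np.linalg.eigh` call used here.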
Similar resources
Clustering Algorithm for Network Constraint Trajectories
Spatial data mining is an active topic in spatial databases. This paper proposes a new clustering method for moving-object trajectory databases. It applies specifically to trajectories that lie only on a predefined network. The proposed algorithm (NETSCAN) is inspired by the well-known density-based algorithms. However, it takes advantage of the network constraint to estimate the object dens...
Clustering Spectral Filters for Extensible Feature Extraction in Musical Instrument Classification
We propose a technique for training feature-extraction models using prior expectations about the regions of importance in an instrument’s timbre. Over a dataset of training examples, we extract significant spectral peaks, calculate their ratio to the fundamental frequency, and use k-means clustering to identify a set of windows of spectral prominence for each instrument. These windows are used to extract...
Restricted Boltzmann Machines with Gaussian Visible Units Guided by Pairwise Constraints
Restricted Boltzmann machines (RBMs) and their variants are usually trained by contrastive divergence (CD) learning, but this training procedure is unsupervised and receives no guidance from background knowledge. To enhance the expressive ability of traditional RBMs, in this paper we propose a pairwise constraints restricted Boltzmann machine with Gaussian visible units (pcGR...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Simultaneous spectral analysis of multiple video sequence data for LWIR gas plumes
We consider the challenge of detection of chemical plumes in hyperspectral image data. Segmentation of gas is difficult due to the diffusive nature of the cloud. The use of hyperspectral imagery provides non-visual data for this problem, allowing for the utilization of a richer array of sensing information. We consider several videos of different gases taken with the same background scene. We i...